Using the Noah package (by Tobias Busch) allows you to create a portfolio without disclosing any TOP SECRET information. The Noah package creates pseudonyms with hilarious animal names!
As a freelancer, I work with sensitive information all the time, and I also know that my portfolio is very important for generating new leads. This becomes a problem when clients do not want their super-duper top-secret data shared with others. A freelancer who just shares the juice for the world to see breaks the client's trust and damages their own integrity.
Not a good idea!
But you can easily mask the data with pseudonyms. This hides the critical information while still showcasing your findings in your portfolio.
Let’s take a look at how this works.
#install.packages(c("tidyverse", "noah"))
library(tidyverse)
## Warning in as.POSIXlt.POSIXct(Sys.time()): unable to identify current timezone 'C':
## please set environment variable 'TZ'
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.6 v dplyr 1.0.8
## v tidyr 1.2.0 v stringr 1.4.0
## v readr 2.1.2 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(noah)
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(DT)
We load the tidyverse to help with tidy data, and the noah package to create an amazing animal mask for the data. We will also add plotly to make our ggplots a little more dynamic and DT to create dynamic tables.
df <- read.csv("Demo Data.csv", stringsAsFactors = FALSE)
datatable(
  df,
  options = list(pageLength = 5),
  rownames = FALSE
)
## Warning in instance$preRenderHook(instance): It seems your data is too big
## for client-side DataTables. You may consider server-side processing: https://
## rstudio.github.io/DT/server.html
Taking a glance at this sample eCommerce data, we can see that it contains some critical information.
These are the kinds of data points that clients may not want you to share with the world. At this point you might say, “Well, I can’t put this in my portfolio. Now what?”
WRONG
Just mask these data points. Easily!
The function we are going to use is pseudonymize() from the Noah package.
pseudonymize(df$sku)[1:10]
## [1] "Secretive Crocodile" "Interesting Dodo" "Defeated Minnow"
## [4] "Handsomely Lynx" "Excellent Puma" "Third Jacana"
## [7] "Colorful Wombat" "Well-To-Do Wildcat" "Third Jacana"
## [10] "Flaky Alligator"
As you can see, the SKU field has now magically turned into superhero animals (well, if animals had superhero names).
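The repeated "Third Jacana" in the output above hints at how the mapping behaves: the same input value gets the same pseudonym. Here is a minimal sketch of that behavior, using made-up SKU strings (the `skus` vector is hypothetical, not taken from the demo data). The animal names are picked at random, so yours will differ from run to run.

```r
library(noah)

# Hypothetical SKU values; "SKU-001" appears twice on purpose
skus <- c("SKU-001", "SKU-002", "SKU-001", "SKU-003")

masked <- pseudonymize(skus)

# Identical inputs should share one pseudonym ...
masked[1] == masked[3]

# ... and distinct inputs should each get their own name
length(unique(masked)) == length(unique(skus))
```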
Next, let’s clean up the data and put this in a data frame!
mask_df <-
  df %>%
  mutate(
    fake.sku = pseudonymize(sku),                        # adding the animal names
    Sales = as.numeric(gsub("\\$", "", product.sales)),  # cleaning up the sales number
    # cleaning the date format (column referenced directly, not via df$)
    date = format(as.Date(date.time, "%b %d, %Y"), "%d %B %Y"),
    # combining the fee data together
    fees = as.numeric(gsub("\\$", "", fba.fees)) +
      as.numeric(gsub("\\$", "", selling.fees)) +
      as.numeric(gsub("\\$", "", other.transaction.fees)),
    profit = Sales + fees                                # getting the profit
  ) %>%
  select(date,
         sku,
         fake.sku,
         Sales,
         fees,
         profit,
         order.city,
         order.state)                                    # selecting only the necessary data
datatable(mask_df,
          options = list(pageLength = 5),
          rownames = FALSE)
As you can see in the table above, we have now made an animal mask for our SKU data. Cue the “What Does the Fox Say?” song. Each SKU gets its own superhero (or supervillain) animal name, and no two SKUs share a name.
This is perfect for hiding important information while allowing us to showcase our project for our portfolio.
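One thing to watch before publishing: the table above still carries the real sku column next to fake.sku. The sketch below shows one way to handle that, on a toy data frame with made-up values (`toy_df`, `sku_key`, and `public_df` are hypothetical names). It also assumes noah's `Ark` object and the `.ark` argument of `pseudonymize()`, which let the same value keep the same pseudonym across separate calls.

```r
library(dplyr)
library(noah)

# Toy stand-in for mask_df; all values are made up for illustration
toy_df <- data.frame(
  sku   = c("SKU-001", "SKU-002", "SKU-001"),
  Sales = c(10, 20, 30)
)

# An Ark remembers which pseudonym it handed to which value,
# so later calls that pass the same ark reuse the same names
ark <- Ark$new()

masked_df <- toy_df %>%
  mutate(fake.sku = pseudonymize(sku, .ark = ark))

# Private lookup table (real SKU -> pseudonym): keep this one to yourself
sku_key <- masked_df %>%
  distinct(sku, fake.sku)

# Public version: the real SKU column is dropped before sharing
public_df <- masked_df %>%
  select(-sku)
```

Keeping the key table private means you can still translate a pseudonym back to the real SKU for the client, while everything built from `public_df` stays safe to show.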
Let’s take this one step further and make some charts!
p1 <-
  ggplot(mask_df %>%
           group_by(sku, fake.sku) %>%
           summarise(Sales = sum(Sales),
                     fees = sum(fees),
                     profit = Sales + fees,
                     margin = profit / Sales) %>%
           arrange(desc(margin)) %>%
           head(10),
         aes(x = fake.sku, y = profit)) +
  geom_col(fill = "#A38560") +
  coord_flip() +
  theme_bw()
## `summarise()` has grouped output by 'sku'. You can override using the `.groups`
## argument.
ggplotly(p1)
datatable(
  mask_df %>%
    group_by(sku, fake.sku) %>%
    summarise(Sales = sum(Sales),
              fees = sum(fees),
              profit = Sales + fees,
              margin = profit / Sales) %>%
    arrange(desc(margin)) %>%
    head(10),
  rownames = FALSE
)
## `summarise()` has grouped output by 'sku'. You can override using the `.groups`
## argument.
In the plot above, we take the top 10 SKUs by profit margin and chart the total profit each one has earned the client. At the time of writing, Capable Duck brings in the most profit among those top 10 (the pseudonyms are generated at random, so your names will likely differ).
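To make the ranking logic concrete, here is a small sketch of the same summarise step on made-up numbers. It assumes fees are stored as negative amounts, which is what adding them to Sales to get profit implies; the `toy_sales` rows are invented for illustration.

```r
library(dplyr)

# Made-up rows; fees are negative, as the profit = Sales + fees step assumes
toy_sales <- data.frame(
  fake.sku = c("Capable Duck", "Capable Duck", "Flaky Alligator"),
  Sales    = c(100, 50, 400),
  fees     = c(-30, -15, -240)
)

toy_summary <- toy_sales %>%
  group_by(fake.sku) %>%
  summarise(
    Sales  = sum(Sales),
    fees   = sum(fees),
    profit = Sales + fees,    # uses the summed Sales and fees from the lines above
    margin = profit / Sales,  # share of sales left after fees
    .groups = "drop"
  ) %>%
  arrange(desc(margin))

toy_summary
```

In this toy data, Capable Duck ranks first on margin (105 / 150 = 0.7) even though Flaky Alligator earns more total profit (160, at a margin of 0.4), which is why a chart of profit for the top-margin SKUs is not necessarily sorted by bar length.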
Now this chart has masked the important information. All we need to do is build out a full portfolio and share it with the world, without exposing any of the client's IP to the public.
Now when we share our portfolio, we can explain, share, and show what we did for previous clients with ease.